[Internal] Replica Validation: Refactors Code to Address Follow Ups #3929

kundadebdatta
Member

Pull Request Template

Description

Do not review this PR. It is meant as a sneak peek at the upcoming changes for the replica validation follow-up work.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Closing issues

To automatically close an issue: closes #IssueNumber

kundadebdatta self-assigned this on Jun 20, 2023
kundadebdatta added the "Do Not Review" label (marks a PR in "work in progress" state) on Jun 20, 2023
@@ -41,6 +41,7 @@ internal sealed class StoreClient : IStoreClient
bool useMultipleWriteLocations = false,
bool detectClientConnectivityIssues = false,
bool disableRetryWithRetryPolicy = false,
bool enableReplicaValidation = false,
Member

Compute uses shared ServerStoreModel/Transport.
Thoughts on modeling options as request options?

/// unhealthy one. The default value for this parameter is false.
/// </summary>
/// <remarks>
/// <para>This is optimal for workloads where latency spikes are critical during upgrades.</para>
Member

Mentioning only upgrades is not right, is it?
General wording would be ideal; otherwise exclude it for now.

Member Author

Ack. I have taken a note of this and will remove it from the original PR.

/// <summary>
/// Gets or sets the prioritize healthy replicas flag.
/// Prioritizing healthy replicas helps the cosmos client to become more
/// resilient to connection timeouts, by choosing a healthy replica over an
Member

"Healthy" and "unhealthy" are not defined and could be interpreted in many ways.

#else
internal
#endif
CosmosClientBuilder WithPrioritizeHealthyReplicas(
Member

Does Java have it enabled by default, or is it a public API?

Member Author

@kundadebdatta kundadebdatta Jun 22, 2023

Java controls it through environment variables. I think it's enabled by default. Cc: @xinlian12

Member

Yes, it is controlled by an environment variable, and it is enabled by default.
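
For comparison, an environment-variable-controlled flag that defaults to enabled could be sketched in C# roughly as follows. This is only an illustration of the pattern described in this thread: the variable name `EXAMPLE_REPLICA_VALIDATION_ENABLED` and the `ReplicaValidationFlagSketch` class are hypothetical and are not taken from the Java or .NET SDK.

```csharp
using System;

public static class ReplicaValidationFlagSketch
{
    // Hypothetical variable name, used only for illustration.
    public const string VariableName = "EXAMPLE_REPLICA_VALIDATION_ENABLED";

    // Interprets a raw environment value. A missing, empty, or
    // unparseable value falls back to the default (enabled).
    public static bool IsEnabled(string rawValue, bool defaultValue = true)
    {
        if (string.IsNullOrWhiteSpace(rawValue))
        {
            return defaultValue;
        }

        // bool.TryParse accepts "true"/"false" in any casing.
        return bool.TryParse(rawValue, out bool parsed) ? parsed : defaultValue;
    }

    // Reads the flag from the process environment.
    public static bool IsEnabledFromEnvironment()
    {
        return IsEnabled(Environment.GetEnvironmentVariable(VariableName));
    }
}
```

With this shape the feature stays on unless someone explicitly sets the variable to `false`, which is the opt-out behavior described above, as opposed to an opt-in builder method such as `WithPrioritizeHealthyReplicas`.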

@@ -47,6 +47,7 @@ internal class GatewayAddressCache : IAddressCache, IDisposable

private readonly CosmosHttpClient httpClient;
private readonly bool isReplicaAddressValidationEnabled;
private static readonly TimeSpan WarmupCacheAndOpenConnectionTimeout = TimeSpan.FromMinutes(40);
Member

What is the 40 minutes for? Why not leverage the incoming cancellation token?

Member Author

We are leveraging the incoming cancellation token. We add a Task.Delay(40 minutes, cancellationToken) and wait for the cancellation token to expire. The 40 minutes here is just a placeholder that defines the maximum wait time for opening the connections. This can be polished later.

[image]

See this example for more detail around the approach.

Member

The incoming cancellation token may not exist, so we need the LinkedTokenSource to guard, through a Task.Delay, against the process taking longer than some expected time. The correct way to cooperate with a Task.Delay is what Deb is doing, through the linkedTokenSource.

Member

If the incoming token is cancelled first, it also cancels the linked one, which cancels all Tasks. The linked token is also used to cancel the Task.Delay if the warmup finishes first, which is the expected scenario. Otherwise the Task.Delay keeps running in the background.
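
The pattern discussed in this thread can be sketched roughly as below. This is a minimal illustration under assumptions, not the code from the PR: `WarmupSketch`, `RunWithTimeoutAsync`, and the `warmup` delegate are hypothetical names, and the real change uses the `WarmupCacheAndOpenConnectionTimeout` field as the maximum wait.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WarmupSketch
{
    // Hypothetical helper. Returns true if the warmup finished before
    // maxWait elapsed, false if the timeout won the race. The caller's
    // token may be CancellationToken.None.
    public static async Task<bool> RunWithTimeoutAsync(
        Func<CancellationToken, Task> warmup,
        TimeSpan maxWait,
        CancellationToken cancellationToken)
    {
        using (CancellationTokenSource linkedTokenSource =
            CancellationTokenSource.CreateLinkedTokenSource(cancellationToken))
        {
            // Both tasks observe the linked token, so cancelling the
            // incoming token cancels the delay and the warmup together.
            Task timeoutTask = Task.Delay(maxWait, linkedTokenSource.Token);
            Task warmupTask = warmup(linkedTokenSource.Token);

            Task completed = await Task.WhenAny(warmupTask, timeoutTask);

            // Stop whichever task is still pending: the delay when the
            // warmup finished first (the expected case), or the warmup
            // when the maximum wait elapsed.
            linkedTokenSource.Cancel();

            if (completed == warmupTask)
            {
                await warmupTask; // surface any warmup exception
                return true;
            }

            // Distinguish caller cancellation from a plain timeout.
            cancellationToken.ThrowIfCancellationRequested();
            return false;
        }
    }
}
```

Without the `linkedTokenSource.Cancel()` call, a warmup that completes in seconds would leave the delay timer pending for the full maximum wait, which is the "keeps running in the background" case mentioned above.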

@kundadebdatta
Member Author

Draft PR. Closing as the original issue has been addressed.

4 participants